Abstract

As we enter the 21st century, the art and practice of warfare are changing radically. The US
has emerged as the dominant conventional military power only to find its adversaries
working their way out of the box. The Defense Advanced Research Projects Agency Information
Systems Office (DARPA/ISO), which is seeking new approaches to asymmetric threat
modeling, analysis, and prediction, sponsored this work as well as several related
research efforts during FY 2000. This paper enumerates some of the main features of the
asymmetric environment and summarizes shortfalls in our current wargame technology.
It argues that contemporary developments in game theory provide a flexible and
promising framework in which to model adversarial motivation efficiently and to
generate representative asymmetric strategies, both for improved automation of behaviors in
simulations and to support Information Operations analysis and planning. Genetic
programming and reinforcement learning are suggested as approaches for extracting and
refining multi-player models from historical data.


Overview 

The Defense Advanced Research Projects Agency Information Systems Office (DARPA/ISO), under the
guidance of Larry Willis (Wargaming the Asymmetric Environment Program Manager), is investigating
technology needs and opportunities for asymmetric wargaming. During FY2000 several efforts were
launched to extract predictive models from historical behavioral data and to explore more powerful and
flexible modeling and analysis technology that could be readily applied to asymmetric conflict. This
DARPA-sponsored research is presented in four sections:

1. Description and nature of the asymmetric threat

2.  DoD  wargaming  needs  with  respect  to  the  asymmetric  threat 

3.  Game  theoretic  and  related  approaches  to  modeling  and  wargaming  the  asymmetric  threat 

4.  An  outline  of  selected  R&D  needs  for  practical  and  rapid  application  of  game  theoretic  approaches. 

Asymmetric  Threat:  Nature  and  Challenge 

The term asymmetric, as applied to asymmetric threat or asymmetric warfare, has several meanings that
are seldom carefully distinguished in common parlance. Fundamentally, asymmetry tilts the
offensive/defensive equilibrium to the perpetrator's perceived advantage by exploiting defense
vulnerabilities or offense restraints with unconventional, relatively inexpensive methods. An asymmetric
attack is much less expensive to wage than it is to defend against. Conversely, an asymmetric defense
tactic is more difficult (expensive) to penetrate than it is to set up. For example, in the UK's
Kosovo: Lessons from the Crisis [1] we read:

"The Kosovo campaign was notable for the wide use of asymmetric (that is to say non-conventional)
tactics by the Yugoslav/Serbian forces. Examples included: the location of tanks
and other military equipment in the middle of villages and in other locations where the
Yugoslav/Serbian forces knew that our concern to minimise collateral damage would prevent us
from targeting them; at least one case of the use of human shields was documented by Human
Rights Watch, and the OSCE suggest there may have been more; attacks against civilians; and
extensive disinformation/propaganda."

[1] http://www.mod.uk/news/kosovo/lessons/intro.htm, Geoffrey Hoon MP, et al.

In this example the Yugoslavian and Serbian forces understood and took advantage of the offensive
restraint of NATO and Allied forces, and were able to win either way. Declining to strike, in order to
avoid the inevitable collateral damage and loss of innocent civilian life, would preserve the Serbian war
machine; callously attacking the human-shielded targets would be a double loss, first morally and second
through the public relations crisis that the inevitable news coverage would certainly create. The Serbs
skillfully controlled news exposure of events during this campaign. They were also masters of camouflage,
decoys, and deception (skills developed in WWII and retained since). Tank turrets placed atop destroyed
tank hulls repeatedly fooled battle damage assessment (BDA), effectively multiplying the cost of the
average tank kill.
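The "win either way" structure of the human-shield dilemma can be written down as a small two-player payoff table. The following is only a minimal sketch: the ordinal payoff numbers, the move names, and the helper functions are hypothetical and chosen for illustration; they are not taken from the Kosovo report or from any wargame.

```python
# Hypothetical ordinal payoffs (higher is better) for the human-shield dilemma.
# Keys are (nato_move, serb_move); values are (nato_payoff, serb_payoff).
# The numbers are illustrative assumptions, not data from the source.
PAYOFFS = {
    ("strike", "shielded"):   (-2,  1),  # target hit, but moral and public relations loss
    ("hold",   "shielded"):   (-1,  2),  # Serbian armor is preserved
    ("strike", "unshielded"): ( 2, -2),  # clean kill, no collateral damage
    ("hold",   "unshielded"): ( 0,  0),  # status quo
}
NATO_MOVES = ("strike", "hold")
SERB_MOVES = ("shielded", "unshielded")


def serb_dominant_move():
    """Return a Serbian move that is strictly better against every NATO move, if any."""
    for move in SERB_MOVES:
        others = [m for m in SERB_MOVES if m != move]
        if all(PAYOFFS[(n, move)][1] > PAYOFFS[(n, o)][1]
               for n in NATO_MOVES for o in others):
            return move
    return None


def nato_best_reply(serb_move):
    """Return the NATO move with the highest NATO payoff against a fixed Serbian move."""
    return max(NATO_MOVES, key=lambda n: PAYOFFS[(n, serb_move)][0])


shield = serb_dominant_move()
reply = nato_best_reply(shield)
print(shield, reply, PAYOFFS[(reply, shield)])
```

Under these assumed payoffs, shielding dominates for the defender, and even the attacker's best reply leaves the attacker with a negative payoff, which is the sense in which the defender wins either way.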

Admiral James O. Ellis, Commander in Chief (CINC) US Naval Forces in Europe (CINCUSNAVEUR),
Commander Allied Forces Southern Europe, and commander of Joint Task Force (JTF) Noble Anvil during
Operation Allied Force, says that the allied Operations Plan (OPLAN) focused on brief, single-dimension
combat [2]. Consequently, deception, diversion, and feint opportunities were lost; we failed to plan
adequately for branches and sequels. In an age when national decision making and commitment are driven
more by public opinion than by policy principles and leadership, we are particularly vulnerable to enemy
information operations (IO) and propaganda, which are generally considered to be tools in the asymmetric
war chest. As a consequence, modern asymmetric conflict tends simultaneously to expand the
dimensionality of conflict and to amalgamate concerns, decisions, and actions conventionally separated
into strategic, operational, and tactical categories. All of the following slowed the Decide-Act side of our
own Observe-Orient-Decide-Act (OODA) [3] loops and reduced our control of the operational tempo
(OPSTEMPO):

•  The  NCA  /  NAC  target  approval  processes 

•  Our  poor  OPSEC  posture  (NATO  and  US) 

•  Our inability to wage a full IO campaign

•  Our  self-suspension  on  cluster  munitions 

•  Our  standards  for  limiting  Collateral  Damage 

•  Our  aversion  to  US  casualties  and  ground  combat 

•  Our  reactive  vs.  proactive  Public  Info  &  Public  Affairs 

Admiral Ellis states that IO has "incredible potential... must become our asymmetric point of main effort".
In his judgement, properly executed IO could have halved the length of the Operation Allied Force
campaign.

The acquisition, operation, and maintenance of our Command, Control, Communications, Computers, and
Intelligence (C4I) infrastructure open yet another venue for asymmetric attack. Many Department of
Defense (DoD) and Intelligence Community (IC) advances are achieved by embedding or integrating
commercial off the shelf (COTS) technology. As a product of the global economy, the origin, composition,
and distribution of COTS are not entirely under US control. Substantial reliance on COTS [4] poses a risk
management challenge: the danger of enemy exploitation of discovered or planted COTS security flaws
must be weighed against state of the art information technology (IT) capability and performance at much
reduced cost.


[2] Full Dress Blue: A View from the Top, Admiral James O. Ellis, US Navy, PowerPoint Briefing on file,
Sept. 1999.

[3] Complexity Theory and Airpower: A New Paradigm for Airpower in the 21st Century, Steven M. Rinaldi,
in Complexity, Global Politics and National Security, edited by David S. Alberts and Thomas J. Czerwinski,
National Defense University Institute for National Strategic Studies (ISBN 1-57906-046-3), June 1997,
285-289.

[4] Eliminating COTS reliance would not necessarily eliminate vulnerabilities, and even secure systems are
subject to denial of service attacks. Homogeneity and social engineering are two primary vulnerabilities of
large-scale systems. For more details on this see: http://niap.nist.gov/presentations/Hacking99.ppt

While the DoD/IC C4I infrastructure presents a key target for enemy exploitation, it is dwarfed by the
domestic commercial infrastructure. Richard Clarke, White House National Security Council staff
coordinator for security, infrastructure protection and counter-terrorism, warns that several countries are
carrying out "electronic reconnaissance today on our civilian infrastructure computer networks" [5].
Richard Perle, former assistant US secretary of defense for international security policy, said US
authorities had detected intrusions into US networks from North Korea, and that North Korean hackers
had left behind malicious code designed for possible activation as a kind of Trojan horse.

Asymmetric targeting is yet another dimension. Terrorism often intentionally strikes civilian or other
non-combatant targets of opportunity, for the purpose of creating panic and shaking confidence in the
competence of the ruling power or otherwise damaging social stability and welfare. From an account [6] of
the recent atrocities in Kosovo, we read:

"Forcing the refugees over the borders, NATO intelligence experts believe, served another
purpose: overwhelming NATO troops stationed in Macedonia with an unmanageable relief
crisis, calculating that the task of feeding, housing and caring for hundreds of thousands of
refugees would consume the alliance's energies and divert it from preparing a military
campaign.

'It was the first use of a weapon like this in modern warfare,' a NATO intelligence officer said.
'It was like sending the cattle against the Indians.'"

Electric  power  generation  and  distribution,  food,  water,  sewage,  banking,  financial  networks, 
communications,  and  transportation  systems  are  all  potential  targets  in  asymmetric  war.  Our  asymmetric 
adversaries  are  not  constrained  by  Judeo-Christian  morality;  in  their  game,  the  end  justifies  the  means. 
Suicide  bombing,  chemical,  biological  and  nuclear  attacks  against  civilian  populations  are  admissible 
options  for  many  terrorist  organizations.  Currency  manipulation  and  trade  imbalances  may  also  be  used 
effectively  in  an  asymmetric  war.  US  adversaries  plan  to  execute  combinations  of  asymmetric  attacks 
anticipating  a  composite  effect  of  sufficient  pain,  damage,  chaos  and  demoralization  to  achieve  their 
objectives. 

Contemporary Chinese thought on future warfare, such as Qiao Liang's and Wang Xiangsui's Unrestricted
Warfare [7], proposes eight principles of warfare:

1. Omnidirectionality - all factors bearing on the desired outcome of a war are considered: military,
political, economic, cultural, religious, psychological

2. Synchrony - a change of emphasis from sequencing and phasing to completion of actions within the
same period of time in order to bring about the greatest impact, not unlike Col. John A. Warden's
notion of parallel warfare [8]

3. Limited Objectives - choosing objectives wisely, not overreaching capacity to act effectively

4. Unlimited Measures - all means are considered; the filtering process is limited only by concerns about
their potential effects vis-à-vis the limited objectives

5. Asymmetry - refusal to confront the main force of the opponent; seek out and strike the weak spots
which will cause the greatest psychological shock to the adversary

6. Minimal Consumption - economy of consumption is guided by maximization of means (unlimited
measures); consumption is governed by the form of combat; rational choice of means should dominate
thrift

7. Multidimensional Coordination - refers back to coordination of (1) and (2)

8. Adjustment and Control of the Entire Process - effective feedback and control for what is expected to
become a shorter and more dynamic process.

[5] Foreign powers probing U.S. networks: official, Jim Wolf, Washington (Reuters), June 19, 2000.

[6] http://www.nytimes.com/library/world/europe/052999kosovo-atrocities.3.html

[7] Unrestricted Warfare, Qiao Liang and Wang Xiangsui, Beijing: PLA Literature and Arts Publishing
House, February 1999. A four part English translation is available online under the topic PRC National
Security and EST Issues at http://www.usembassy-china.org.cn/english/sandt

[8] Complexity Theory and Airpower: A New Paradigm for Airpower in the 21st Century, Steven M. Rinaldi,
in Complexity, Global Politics and National Security, edited by David S. Alberts and Thomas J. Czerwinski,
National Defense University Institute for National Strategic Studies (ISBN 1-57906-046-3), June 1997,
288-290.

War has always been hell; unrestricted war promises to open the gates of hell far and wide. Our enemies
are restrained only by utilitarian self-interest, are increasing their ability to strike us, and are in many
cases composed of small transnational organizations. Military targets are avoided while soft civilian
targets are savaged. Barbaric atrocities are commonplace as we enter the 21st century; we witness the
steady regress of civilization. How do we resist, analyze, plan, and train to reverse this process? Clearly
the answer does not lie in military force alone, but it is also clear that military engagements in the era of
unrestricted (asymmetric) war will require a much more sophisticated and agile response than is possible
today.

Wargames  are  used  for  both  training  and  analysis  as  well  as  in  mission  planning  and  rehearsal.  How  we  can 
efficiently  synthesize  sophisticated  and  agile  decision-making  models  for  wargaming  the  asymmetric 
environment  will  be  the  focus  of  the  rest  of  this  paper.  In  the  next  section  we  review  and  summarize  the 
current  state  of  wargaming  in  the  DoD.  Following  the  overview,  we  introduce  basic  concepts  of  game 
theory  and  outline  some  R&D  directions  for  wargaming  the  asymmetric  threat. 


Overview  of  Wargaming 

Contemporary US strategy and doctrine are based on joint and coalition operations. DoD operational
wargames typically involve multi-echelon (blue) participants in manual role-playing with enemy (red),
controller or arbitrator (white), and possibly a number of neutral, friendly or coalition teams. Depending on
the purposes (training, analysis, rehearsal, etc.) and size of the wargame, considerable background support
and infrastructure may be involved as well. Virtual simulations are used in training and exercise wargames
to stimulate the C^ equipment of trainees actually in the field, significantly augmenting the training
environment with synthetic red or blue forces as needed. The need for valid and realistic simulated
component behavior has long required labor-intensive scenario development and setup and, depending on
scope, a sizeable support team to steer or correct simulation behaviors that have gone off track during the
course of the run. High-resolution, multi-echelon constructive simulations are particularly susceptible to
aberrant or irrational displays of behavior.


Limitations  of  Current  DoD  Wargames 

Current wargames tend to focus on attrition modeling of symmetric force-on-force employment. Attrition
is an important factor, and much of our political posturing and commitment depends on estimates of
attrition for proposed military actions, but it is not the only factor. A number of shortfalls [9] exist in the
current generation of DoD wargames:

•  Narrow Operational Spectrum - Existing models do not portray the full range of military operations
such as Operations Other Than War (OOTW) and IO

•  Low Fidelity Interaction - Modeling & simulation (M&S) systems that simulate functions such as
transportation, logistics, intelligence, space, and special operations do not interact with combat models
at the desired resolution and fidelity

•  Decoupled Strategic Effects - Existing simulations do not reflect the strategic effects of military
operations and require excessive intervention and tedious workarounds to inject effects of strategic
attack

•  Poor Adversarial Automation - Existing simulations provide task organization and equipment for
foreign powers, but authentic or effective strategy and tactics depend on manual role playing

•  Labor Intensive - Scenario development can take many staff-months of effort, and the need for
human role players to provide credible performance in training exercises exacerbates the problem

•  Lagging Visualization - DoD wargames have not kept up with commercial games in terms of their
graphics and performance characteristics but remain focused on geographic and physical
environment.

[9] Joint Simulation System Operational Requirements Document, Version 3.0, 23 June 1999, USACOM
JWFC M&S Development Branch, USACOM Joint Warfighting Center, Fenwick Rd., Bldg. 96, Fort
Monroe, VA 23651-5000, Attn: LCDR Jim Dick, USN, JW543, page 12.


DoD  Responses  to  Wargaming  Needs 

Emerging wargames need to incorporate the behaviors and combined effects of both major and minor
nation states, a host of non-state actors, non-governmental organizations (NGOs), and transnational and
international terrorist organizations operating in the asymmetric environment, as well as corporate and
criminal entities with significant business interests. We examined the operational requirements documents
(ORDs) for two major DoD Joint Level Wargames currently in design and development, the Joint
Simulation System (JSIMS) and the Joint Warfare System (JWARS), and a National Simulation Center
(NSC) system known as SPECTRUM. Here is a summary:

•  JSIMS [10] will be used by unified commands, other joint organizations, and the Services for training,
education, developing doctrine and tactics, formulating and assessing operational plans, and assessing
warfighting situations. Efficiency in operational and technical responsiveness in presenting a training,
education, or mission rehearsal environment is essential. JSIMS must reduce the personnel and time
required to provide training, education, or mission rehearsal events. JSIMS must also be tailorable in
order to:

    •  Provide a Joint Synthetic Battlespace (JSB) representing all warfare domains and applicable
    functions at a level of resolution appropriate for the training, educational, or mission rehearsal
    simulation event.

    •  Incorporate the effects of non-military factors on mission critical tasks.

    •  Provide the capability to modify JSIMS objects so that new warfighting concepts or equipment
    can be simulated.

•  JWARS [11] will be used by the Services, Combatant Commands, Joint Staff, and Joint Task Force
Commanders and Staffs to support force assessment, planning and execution, system effectiveness
and trade-off analysis, as well as concept and doctrine development and assessment. In the mid-term
JWARS will develop a suite of models, including a true joint warfare analysis model, and in the
far term provide an authoritative representation for analysis.

    •  JWARS will be a constructive simulation of multi-sided (Blue including Coalition, Red, and
    Neutral forces), joint, theater-level wargame for analysis

    •  JWARS will assist implementation of Joint Vision (JV) 2010 by providing a vehicle to assess
    current and future military capabilities within the four now standard operational concepts:

        •  dominant maneuver

        •  precision engagement

        •  focused logistics

        •  full-dimensional protection.

[10] Ibid., pages 6-8.

[11] http://www.jointmodels.army.mil

•  SPECTRUM was developed in the mid-90s in response to a tasking for the NSC to develop a
simulation in support of OOTW training. The system is used to provide stimulus for assessing
effectiveness of simulated interactions with other forces and the general population in custom
developed scenarios.

    •  SPECTRUM is used in exercises and seminars from tactical to strategic level and in training
    exercises by incorporating political, sociological, economic and cultural wildcards into the
    military decision making process.

    •  It represents government and non-government agencies, military, and para-military forces.

    •  A Regional Analysis Model (RAM) is embedded in SPECTRUM that provides semi-automated,
    macro-level interaction between political, economic, and social factors. The RAM represents the
    region's constituents by defining social groups, institutions and outside actors.

    •  There are 21 macro-level regional indicators that model a region politically, economically, and
    sociologically. A Delphi process was used to determine the variables and then assign a value of
    1-100 to the variables.

    •  There are 28 micro-group indicators, many of which are identical to the macro-level regional
    indicators. The major difference in the micro indicators is defining the group characteristics in
    terms of communalism, cohesiveness, leadership, ambition, aggressiveness, protest level, and
    population.

    •  A third matrix defines 24 factors for issues and concerns, recording the population's
    satisfaction with each factor and the importance it attaches to it.

The effects of actions taken over time are then reflected in the factor values of SPECTRUM's modeled
population. The system is polled periodically during the course of an exercise and emits modeled events
according to the current values of these factors, leading to more chaotic and riotous behavior as
population satisfaction levels decline.
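The polling mechanism described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not SPECTRUM's actual design: the factor names, the linear mapping from satisfaction to unrest probability, and the functions `unrest_probability`, `apply_action`, and `poll` are all hypothetical. Only the 1-100 scale and the idea of periodic polling come from the description above.

```python
import random


def unrest_probability(satisfaction):
    """Map a satisfaction score on the 1-100 scale to a per-poll unrest probability.

    The linear form is an assumption made for illustration only.
    """
    clamped = max(1, min(100, satisfaction))
    return (100 - clamped) / 99.0


def apply_action(factors, effects):
    """Apply an action's per-factor deltas, keeping every score within 1-100."""
    return {name: max(1, min(100, score + effects.get(name, 0)))
            for name, score in factors.items()}


def poll(factors, rng):
    """Poll every factor once; return the names of factors that emit an unrest event."""
    return [name for name, score in factors.items()
            if rng.random() < unrest_probability(score)]


# Hypothetical exercise: a player action degrades the "security" factor,
# which raises the chance of an unrest event at the next poll.
factors = {"food": 80, "security": 60}
factors = apply_action(factors, {"security": -40})
events = poll(factors, random.Random(0))
print(factors, events)
```

Lower satisfaction values make events more likely at each poll, which reproduces in miniature the drift toward chaotic behavior as satisfaction declines.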

•  DEXES [12] is an integrated collection of dynamics models governing the time evolution of:

    •  Economic

    •  Social

    •  Political

    •  Public health variables.

The Land Information Warfare Activity (LIWA) is currently supporting development and porting of
this model to the Windows/NT platform for IO support. The model was originally developed during
1995-1997 in support of OOTW for US Southern Command (J5 Plans, Analysis and Simulation
Division). Although we requested detailed documentation on the internal design and operation of this
model, it was never transmitted to us, so the modeling technology reported here is based only on
publicly available documentation and a presentation of the model given by Dr. Ted Woodcock. In the
words of Dr. Loren Cobb, primary developer of DEXES,

"The DEXES family of causal models brings to the wargame environment a
political-social-medical-cultural simulation of the effects of military, governmental, and NGO actions on a
society in the aftermath of a major disaster or civil war. Equally important, the DEXES model
shows the effects of failures to take action. The DEXES model of society is deliberately and
realistically unstable, so that incorrect, omitted, or tardy actions on the part of the players can
result in negative consequences, up to and including the sudden failure of the mission through
societal collapse or the outbreak of civil war."

DEXES appears to encapsulate and automate many of the features modeled in SPECTRUM, with the
addition of a causal set of dynamical models governing the societal variables. It would certainly be
worth taking a more detailed look at the underlying theory and scope of each of these models to
determine whether a more accurate or comprehensive model could be attained by a convergence of the
two, but this was beyond the scope of our investigation.

[12] http://www.aetheling.com/models/MOOTW/DEXES.html

Discussion on Modeling the Asymmetric Environment

JSIMS, JWARS, SPECTRUM and DEXES are a representative sample of DoD efforts to close gaps in
contemporary wargames. A heavy emphasis is placed on building validated behavior by increasing model
fidelity in the hope that this will assure validity. This approach ultimately leads to emulation rather than
simulation. We are rich in information on our own forces' strategy, doctrine, tactics, task organization,
equipment and weapons. We have sources such as DIA and the National Ground Intelligence Center
(NGIC) Land Capabilities Spectrum Model (LCSM) [13] that capture and profile the spectrum of
adversarial military capabilities. In addition, we have allies who perform the same assessments on foreign
military powers and the US, so that authoritative normalized profiles of military capability are relatively
accessible [14].

DoD wargame developers, eager to rapidly construct authentic and accurate representations, have readily
used this kind of information to develop and parameterize their models. Although models have been
implemented in a wide range of coding styles, with representations in procedurally coded task frames,
finite-state machines, rules and constraints, the resulting behaviors are typically insensitive to
opportunities and impending disasters. It appears that modeling priority has generally been on task
organization, weapons and equipment first; sensors and physical environment second; adjudication and
attrition third; instrumentation and after action review (AAR) fourth; with communications and automated
C^ decision making near the end. The answer to keeping automated wargames on track and rapidly
adapting wargame simulations to changes in scenario, strategy, environment, and combatants may lie in a
reversal of the apparent modeling priorities.

The more we try to reconcile behaviors by increasing resolution and fidelity, the greater the
knowledge-engineering burden in development and the smaller the range of alternatives that can be
explored during execution. For example, LIWA apparently profiles [15] selected IO targets and key
decision makers by collecting a wide spectrum of information including: education, politics, employment,
military service, family, accessibility, religion, political goals, motivation, predisposition, psychological
disorders, health, special relationships, advisors, international experience, foreign travel, biases, ambition,
upbringing, birth place, age & sex, and heritage. Any individual behavioral model based on as many
variables as would likely emerge from a quantitative articulation of this profile would introduce an
enormous error budget for the overall model. Maintaining such a model over its life cycle would be
problematic, as the interactions between the collected factors would likely evolve as the factors
themselves evolve. Even if predictive computational models currently existed that accounted for all of
these variables (and they do not), scalability and maintainability would remain challenges. Modeling the
asymmetric environment demands a broad spectrum of actors whose venues and methods of attack are
not easy to adjudicate and are not limited to attrition.
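To give a rough sense of the scale of the error budget, consider a purely illustrative calculation. It assumes, hypothetically, that a behavioral score is a product of one factor per profile variable and that each factor is known to within the same relative error; neither assumption comes from LIWA or from any fielded model. The profile listed above contains 22 variables.

```python
import math


def worst_case_relative_error(n_vars, rel_err):
    """Relative error of a product of n_vars factors when every factor is off
    by rel_err in the same direction (errors compound multiplicatively)."""
    return (1.0 + rel_err) ** n_vars - 1.0


def independent_relative_error(n_vars, rel_err):
    """First-order relative error of the same product when the factor errors
    are independent, so relative variances add (root-sum-square)."""
    return math.sqrt(n_vars) * rel_err


n_profile_vars = 22      # variables in the profile listed above
assumed_rel_err = 0.10   # hypothetical 10% uncertainty per variable

print(worst_case_relative_error(n_profile_vars, assumed_rel_err))
print(independent_relative_error(n_profile_vars, assumed_rel_err))
```

Under these assumptions, a 10% uncertainty per variable grows to roughly 47% for the product if the errors are independent, and to several hundred percent in the worst case, before any interaction between variables is modeled.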

Commercial  Off  the  Shelf  (COTS)  Technology 

The commercial game industry continues to push the envelope on photo-realism and scene rendering and is
moving into massively distributed game playing. The commercial game sector is primarily focused on
entertainment and market capture. Remarkable progress has been made in anatomical modeling and
rendering of natural movement [16] and gesture. Emotional response is modeled in several first person
shooter and role playing games. These advances may be of some value in visualization and distributed
training for DoD wargames, but commercial games have little to offer in addressing the problems of
narrow operational spectrum, decoupled strategic effects, poor adversarial automation, and labor
intensive scenario development.

[13] http://www.fas.org/irp/agency/army/tradoc/usaic/mipb/1996-2/schlus.htm

[14] A fairly comprehensive unclassified overview of US forces, doctrine and equipment is maintained online
by the Air War College at http://www.au.af.mil/au/awc/awcgate/awcgate.htm

[15] Stakeholder Modeling and Simulation of Asymmetric Environments (Draft Report), The MITRE
Corporation, Lee Scott Ehrhart, Ph.D., January 2000, 12.

[16] See for example http://www.dailyradar.com/features/game_feature_page_539_1.html,
http://www.eai.com/products/jack/classicjack.html

2000 The MITRE Corporation. ALL RIGHTS RESERVED.

Some special purpose COTS products do exist that directly address coupling of strategic effects. One
example is SIAM [17], a tool for building influence diagrams (networks of causal, or more accurately,
probabilistic links between decision variables and utility measures based on user provided expert opinion).
Underlying SIAM's visual influence diagram, a Bayesian network enforces consistency and propagates the
effects of updated influences and event outcomes. Similar Bayesian or belief network tools such as
Hugin [18], Genie and Smile [19] also exist, but SIAM has actually been put into operational use for
military intelligence applications.
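As an illustration of what "propagating the effects of updated influences and event outcomes" means in a Bayesian network, the following minimal sketch performs inference by brute-force enumeration on a three-node chain. The node names A, B, C, every probability value, and the `posterior` helper are hypothetical; this is not SIAM's, Hugin's, or any other tool's algorithm or API, and real tools use far more efficient propagation schemes.

```python
from itertools import product

# Hypothetical three-node chain A -> B -> C over binary variables.
# All probabilities are made-up expert estimates for illustration.
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_GIVEN_B = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
NAMES = ("A", "B", "C")


def joint(a, b, c):
    """Joint probability of one full assignment, using the chain factorization."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def posterior(query, evidence):
    """P(query is True | evidence), summing the joint over all consistent worlds."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # skip assignments that contradict the observed evidence
        p = joint(*values)
        denominator += p
        if world[query]:
            numerator += p
    return numerator / denominator


print(posterior("C", {}))            # belief in C before any observation
print(posterior("C", {"A": True}))   # effect of an observed cause propagates forward
print(posterior("A", {"C": True}))   # an observed outcome updates belief in the cause
```

Observing an upstream node shifts the belief in downstream nodes, and observing an outcome revises the belief in its causes; a network of this kind keeps all such updates mutually consistent.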

There are emerging third party vendors [20, 21] who supply, or very soon intend to market, building block
technologies for the artificial intelligence (AI) components that could assist in adversarial automation. But
the game developer's market does not show any sign of moving toward a decision analytic framework. A
straw poll conducted on a game developer's web page [22] reveals the top ten AI modules most valued by
game developers to be in the areas of group movement control (path-finding, tactics, and social flocking),
A*-search, and multi-agent control (reactive, rule-based expert system, finite-state machine, and fuzzy
inference). COTS game developers are interested in realistic graphics and convincing but efficient
behavior. Exploration of alternatives and intelligent rational behavior rank low on the list of market
demands.

High  Power  Knowledge-Based  Technology 

Do the answers lie in continued investment in knowledge-intensive technology such as planning,
scheduling, and high-powered knowledge bases? Perhaps in the long run, but these technologies are dogged by
two persistent challenges: (1) they demand a heavy upfront investment in knowledge-engineering and
development of domain specific heuristics for their strength, and (2) once constructed, knowledge-based
systems for simulations, such as SOAR [23], are computationally challenging and do not scale well.

New  DARPA  efforts  in  rapid  knowledge  base  formation  promise  to  mitigate  the  knowledge-engineering 
bottleneck  (1)  by  development  of  tools  and  methodologies  for  declarative  knowledge  representation  and 
capture,  and  ontology  design,  development,  and  reuse,  but  these  tools  are  nascent.  Given  the  current  state  of 
knowledge-engineering  technology,  a  new  modeling  approach  may  be  in  order:  a  shift  of  emphasis  from 
engineering  doctrinal  procedures  and  sophisticated  symbolic  reasoning  toward  simpler  decision 
mechanisms  based  on  maximization  of  expected  utility.  This  approach  has  been  used  in  econometrics  and 
social  sciences  for  decades  and  as  the  foundation  for  most  military  strategic  analyses,  yet  it  is  rarely  seen  in 
military simulations as a driver for behavior. However, researchers are beginning to explore the efficacy of
utility optimization as a practical basis for behavior modeling in computer generated forces. Booker et
al. [24, 25] have demonstrated more robust (more effective and responsive) control of tactical movement in
ModSAF  with  a  computationally  efficient  control  theoretic  decision  formulation  known  as  the  DRK- 


[17] See http://www.dodccrp.org/Proceedings/DOCS/wcd00000/wcd000c2.htm or, for more details,
Planning With Influence Net Modeling: An Architecture for Distributed, Real-Time Collaboration, Julie A.
Rosen, Ph.D., Wayne L. Smith, Edward S. Smith and Michael A. Maldony, Jr., 66th Military Operations
Research Society Symposium, 23-25 June 1998.

[18] http://www.hugin.dk

[19] http://www2.sis.pitt.edu/~genie

[20] A French group, MASA, has several potential application areas outlined for its products DirectIA and
NetworkEvolver (see http://www.animaths.com).

[21] http://www.cse.buffalo.edu/~goetz/AI/API/gaibody.html#strips

[22] http://www.cse.buffalo.edu/~goetz/AI/API/gaibody.html#strips

[23] http://bigfoot.eecs.umich.edu/~soar/main.html

[24] Toward Motivational Control of Tactical Behaviors, Lashon B. Booker, Ph.D., Carl D. Burke, Gregory
M. Whittaker, Proceedings of the 7th Computer Generated Forces and Behavior Representation, May 1998.

[25] Motivational Control of Tactical Behaviors: Interim Report, Lashon B. Booker, Ph.D., Carl D. Burke,
James D. Hughes, Proceedings of the 8th Computer Generated Forces and Behavior Representation, May
1999 (http://www.sisostds.org/cgf%2Dbr/8th/docs/papers/8th%2Dcgf%2D058.doc).



motivational [26] model. Control theory per se is limited to a single controller's perspective. The multi-player
perspective is fundamentally game theory. In both control and game theory, action selection is based on
environmental values either dynamically sensed or known a priori. When used to model the actions of an
individual decision-maker, the focus is on motivation or volition rather than reasoning or logical
causation. We avoid the burden of constructing an explicit reasoning process at the cost of creating a
preference relation between modeled actions available to the decision-maker and his perceived
environmental state. There are a number of benefits to be gained by simplifying the model structure to that
proposed in the game theoretic literature:

1. Operational cost is substantially reduced because computation, synthesis and selection of
strategy can largely be done ahead of runtime.

2. As the model evolves to cover additional states of the environment or as actors and actions
are added, knowledge-engineering consists of extending and refining the preference relations
(numerical payoff value or utility measure) in the modeled utility function.

3. Integration of any modeled player and consequent behavior changes is completed upon
solution of the game, a purely computational process.

Classical AI approaches to action control or planning depend on calculation and maintenance of many
distinct pre-conditions and effects, while game theory reduces action effects to payoff and focuses on
optimizing expected payoff. It is difficult to know in a complex set of productions if something hasn't
happened because of a knowledge-engineering oversight, because of a logical conflict, or simply because
of an unmet set of conditions. Rule-based systems tend to channel behavior along preconceived lines of
attack that sooner or later become very predictable. As we shall see, game theoretic solutions give rise to
rational yet non-deterministic behavior, which can lead to a much broader exploration of alternative courses
of action than is common practice today.
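As a minimal sketch of how a strategy computed ahead of runtime can still drive non-deterministic behavior, the fragment below draws actions from a stored mixed strategy. The action names and weights are hypothetical placeholders, not values from this study.

```python
import random
from collections import Counter

def sample_action(actions, probabilities, rng):
    """Draw one action according to a precomputed mixed strategy."""
    return rng.choices(actions, weights=probabilities, k=1)[0]

# Hypothetical courses of action and equilibrium weights, assumed to have
# been computed offline by a game solver.
ACTIONS = ["feint", "ambush", "withdraw"]
EQUILIBRIUM_MIX = [0.5, 0.3, 0.2]

rng = random.Random(0)  # seeded only so that a run is repeatable
draws = Counter(sample_action(ACTIONS, EQUILIBRIUM_MIX, rng) for _ in range(10_000))
```

At runtime the simulated player only performs the cheap sampling step; over many plays the observed frequencies approach the stored weights, while any single play remains unpredictable.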

Moving toward a game theoretic framework entails a lessening of model emphasis on detailed how-to
information and more emphasis on how much and when. In this framework, every player has a set of
options and every combination of player options has a value for each player. By reducing the effects of
options to their value or utility and by modeling adversarial decision making as utilitarian self-interest, a
potentially enormous knowledge-engineering burden is eliminated. This payoff or utility is constant and
known in advance in classical game theory, but extensions such as Bayesian game theory and stochastic
game theory have penetrated the world of incomplete information and dynamic payoff states.

Recent efforts to apply a game theoretic perspective to wargame simulation have been focused at the
tactical level. Booker's work mentioned above and work by Katz and Butler [27] investigating game
theory for command and control both focused on ModSAF as the target. Earlier work by Shubik and Weber
recalls so-called Colonel Blotto Games [28] that address operational allocation of forces, and extends this
differential game theoretic approach to strategic issues of complementarity between targets and cost
tradeoffs between system defense and asset hardening. A very recent DARPA Advanced Simulation
Technology Thrust (ASTT) proof of concept study [29] has successfully applied an extensive form game
representation to a ground combat scenario by combining dynamic programming and game theoretic techniques.

In the following section we introduce some of the basic elements of the game theoretic framework and
illustrate the simplest methods of game solution in order to introduce the concept of equilibria. So-called
zero-sum games model pure conflict: one player's gain is another player's loss. Most wargames fall into


[26] On the Fitness of Behaviour Sequences, R. Sibly and D. McFarland (1976), American Naturalist, 110,
601-617.

[27] Game Commander - Applying an Architecture of Game Theory and Tree Look Ahead to the Command and
Control Process, A. Katz, B. Butler (found in transactions of the IEEE 1994).

[28] Systems Defense Games: Colonel Blotto, Command and Control, Martin Shubik, Robert James Weber,
Cowles Foundation Paper 521, Naval Research Logistics Quarterly, 28(2), 1981.
http://cowles.econ.yale.edu/P/cp/p05a/p0521.pdf

[29] Development of Command Decision Making Algorithms for Joint Simulations Integrated Final Report,
October 1999, Sponsored by STRICOM through NAWC TSD on Contract Number N61339-97-C-0038.
this genre, but there are many cases in asymmetric conflict where the objective is to manipulate or form an
alliance or coalition. Shapley and others as long ago as 1953 were laying the theoretical foundations of
coalition games and cooperative game theory.

With  our  dependency  on  information  systems  and  information  operations  in  the  conduct  of  OOTW  (Peace 
Making,  Peace  Keeping,  Humanitarian  Relief)  there  is  a  natural  match  between  game  theory  and  the 
asymmetric  environment.  After  an  introduction  to  the  basic  concepts  of  classical  game  theory  and  a  brief 
indication  of  the  more  advanced  recent  developments,  we  will  address  some  of  the  shortfalls  and  challenges 
of  applied  game  theory  in  analysis  and  prediction. 

Game  Theory  Fundamentals 

In a finite game we have an enumeration of N players $\{\, i \mid 1 \le i \le N \,\}$, each of whom has available
some finite collection of $J_i$ discrete actions, denoted as the list $A_i = \{a_1, a_2, \dots, a_{J_i}\}$, where
$\#A_i$ (the cardinality of $A_i$) $= J_i$.

We define $\Omega$ as the set of all possible plays of the game, generated by the N-fold Cartesian product of the $A_i$,
$\Omega = \{(p_1, p_2, \dots, p_N) \in A_1 \times A_2 \times \dots \times A_N\}$. For each player i we are also given a
utility function $U_i : \Omega \to \mathbb{R}$. We call the value $U_i(\omega)$, for $\omega \in \Omega$, the payoff to
player i for the play $\omega$. Players must simultaneously choose an
action from among their respective options at each play of the game. Classical game theory assumes that
each player knows every other's action set and utility function as well as his own (complete information). It
is also assumed that all players are rational, that is, they choose actions in order to optimize their expected
utility. A player's rule for choosing among his possible actions is called a strategy. (In following sections
we discuss modern developments of game theory that handle incomplete or uncertain information.)

A pure strategy continually chooses the same action at every play of the game; a mixed strategy
non-deterministically chooses an action according to a probability distribution over all possible actions. A pure
strategy is therefore just a special case of the mixed strategy. In either case a strategy for player k may be
represented as a vector $S_k = \langle \xi_1, \xi_2, \dots, \xi_{J_k} \rangle$ on the face of the $J_k$ dimensional unit
simplex $\sigma_k$, where each $\xi_i$ is the probability that action $a_i \in A_k$ will be played. The simplex
constrains the components of $S_k$ so that we have all $\xi_i \ge 0$ and $\sum_i \xi_i = 1$. The $\sum_k J_k$
dimensional manifold $S = \sigma_1 \times \sigma_2 \times \dots \times \sigma_N$ is the global strategy space for
a given finite game, and we denote the $J_i$ dimensional sub-manifold $S_i$ as the restriction of $S$ to $\sigma_i$. A
solution to a game is a global strategy $s \in S$ that optimizes the expected utility for each player i on $S_i$.
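The definitions above translate directly into a small data structure. The sketch below is only an illustration: it stores the utility functions as a table over plays and computes a player's expected utility when every player mixes independently. The payoff numbers (the familiar matching pennies game) and all names are assumptions introduced here.

```python
from itertools import product

def expected_utility(payoff, strategies, player):
    """Expected payoff to `player` when each player mixes independently.

    payoff: dict mapping a play (tuple of action indices, one per player)
            to a tuple of payoffs, one per player
    strategies: one probability vector per player (a point on each simplex)
    """
    total = 0.0
    for play in product(*(range(len(s)) for s in strategies)):
        prob = 1.0
        for k, action in enumerate(play):
            prob *= strategies[k][action]   # probability of this joint play
        total += prob * payoff[play][player]
    return total

# Hypothetical two-player, two-action game (matching pennies).
PAYOFF = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}
uniform = [[0.5, 0.5], [0.5, 0.5]]   # both players mix evenly
pure = [[1.0, 0.0], [1.0, 0.0]]      # both players always pick their first action
```

A pure strategy is simply a probability vector with a single entry equal to 1, so the same function covers both cases.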

Game  theoretic  algorithms  generate  strategies  (prescriptions  for  option  choices)  that  simultaneously 
account  for  all  players  by  finding  equilibrium  points  in  the  strategy  space,  that  is,  points  where  no  player 
can  benefit  by  changing  strategy  assuming  all  other  players  hold  to  their  equilibrium  strategy.  There  are 
actually several variations on the theme of equilibrium, for example:

•  Dominant Strategy - For any player i, strategy $S_i$ dominates if its corresponding expected value is
greater than or equal to the expected value of any other strategy for player i, whatever strategies the
other players choose

•  Nash Equilibrium - For any player i, no change from the Nash equilibrium strategy $S_i$ will yield a higher
expected value for player i, given that all other players maintain their respective Nash equilibrium
strategies

•  Strong  Pareto  Optimality  -  There  is  no  other  equilibrium  strategy  that  gives  any  player  a  higher 
expected  payoff 

•  Weak  Pareto  Optimality  -  There  is  no  other  equilibrium  strategy  that  gives  all  players  equal  or 
greater  expected  payoff 

The  variations  on  equilibrium  characterization  reveal  the  game  theoretic  focus  on  finding  the  best  possible 
decision  strategy.  Table  1  (below)  illustrates  a  simple  example  of  dominant  strategy  for  the  case  of  a  two- 
player  game  where  each  player  has  two  options.  The  values  are  indicated  by  pairs  (value  to  player  A,  value 
to  player  B).  By  inspection,  we  see  that  option  1  is  a  dominant  strategy  for  player  A,  since  no  matter  what 
option  player  B  chooses  the  outcome  is  better  than  for  option  2.  From  player  B's  perspective  option  1  is 
also  a  dominant  strategy. 
                     Player B Option 1    Player B Option 2
Player A Option 1    (4, 1)               (2, 0)
Player A Option 2    (1, 4)               (0, 2)

Table 1. Dominant Strategy Saddle Point (Option 1, Option 1)
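The inspection of Table 1 can be mechanized. The following sketch, with function names chosen here purely for illustration, lists each player's dominant options and the cells from which neither player gains by deviating alone.

```python
# Payoffs from Table 1, indexed [A's option][B's option] -> (value to A, value to B).
TABLE_1 = [
    [(4, 1), (2, 0)],
    [(1, 4), (0, 2)],
]

def dominant_options(game, player):
    """Options of `player` (0 = A/rows, 1 = B/columns) that do at least as well
    as every alternative against each option of the opponent."""
    n_rows, n_cols = len(game), len(game[0])
    own = range(n_rows) if player == 0 else range(n_cols)
    other = range(n_cols) if player == 0 else range(n_rows)

    def value(mine, theirs):
        cell = game[mine][theirs] if player == 0 else game[theirs][mine]
        return cell[player]

    return [o for o in own
            if all(value(o, t) >= value(alt, t) for alt in own for t in other)]

def pure_equilibria(game):
    """Cells (row, column) where neither player can gain by changing option alone."""
    cells = []
    for r, row in enumerate(game):
        for c, (a, b) in enumerate(row):
            a_ok = all(game[alt][c][0] <= a for alt in range(len(game)))
            b_ok = all(game[r][alt][1] <= b for alt in range(len(row)))
            if a_ok and b_ok:
                cells.append((r, c))
    return cells
```

Indices are zero-based, so the result `(0, 0)` corresponds to (Option 1, Option 1) in the table.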

The coincidence of the dominant strategies leads to a saddle-point, or equilibrium point in strategy space.
Both players, if they are motivated by self-gain and are cognizant of the outcomes, will play the pure
strategy of option 1 every time they encounter this game. The game has an equilibrium value of 4 for player
A and 1 for player B. Obviously, not all games are this simple. Table 2 illustrates a variation on the theme
with different payoff values.


                     Player B Option 1    Player B Option 2
Player A Option 1    (-3, 3)              (4, -4)
Player A Option 2    (2, -2)              (-5, 5)

Table 2. Sample game requiring a mixed strategy.

In this case, the equilibrium strategy for player A is a mixed strategy choosing option 1 50% of the time
and option 2 the remaining 50%. Player B divides his strategy 9/14 (64.29%) for option 1 and 5/14
(35.71%) for option 2. The game value is -1/2 for player A and +1/2 for player B. These two simple
examples illustrate the normal form, sometimes called strategic form, of game representation. The game in
Table 2 is also a zero-sum game, that is, the payoffs add to zero in every outcome; one player's gain is
another player's loss. In two-player zero-sum games the solution of the game can be found using the maxi-min
algorithm.
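The stated equilibrium of Table 2 can be checked numerically. The sketch below, using exact fractions and variable names chosen here for illustration, evaluates the game value and the payoff each player would obtain by switching alone to either pure option.

```python
from fractions import Fraction as F

# Table 2, value to player A; player B receives the negative (zero-sum).
A_PAYOFF = [[-3, 4], [2, -5]]

def payoff_to_a(a_mix, b_mix):
    """Expected payoff to player A for the given pair of mixed strategies."""
    return sum(a_mix[i] * b_mix[j] * A_PAYOFF[i][j]
               for i in range(2) for j in range(2))

a_star = [F(1, 2), F(1, 2)]      # player A: 50% / 50%
b_star = [F(9, 14), F(5, 14)]    # player B: 9/14 and 5/14

value = payoff_to_a(a_star, b_star)
pure_options = ([1, 0], [0, 1])
a_deviations = [payoff_to_a(d, b_star) for d in pure_options]    # A switches alone
b_deviations = [-payoff_to_a(a_star, d) for d in pure_options]   # B switches alone
```

Every unilateral switch leaves the deviating player with exactly the equilibrium payoff, so neither player can improve on the mixed strategy by changing alone.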

The maxi-min algorithm calculates the maximum of the minimum utility player 1 can guarantee given the
value of his options. This is better described with a calculation and corresponding graph. Taking the role of
player 1 (player A in Table 2), we must calculate our utility as a function of our strategy for each pure option
of our opponent, player 2 (player B). We know we must play some combination of our options, either a pure
strategy for option 1 or a pure strategy for option 2 or some mixture of the two. We express our unknown
strategy as a frequency vector (or convex combination) of our options, ((1-p), p).


Player 2 Chooses    Player 1 Strategy is: (1-p) for option 1, p for option 2    Expected Payoff is:
Option 1            (1-p)*(-3) + p*(2)                                          5p-3
Option 2            (1-p)*(4) + p*(-5)                                          -9p+4

Table 3. Calculating Maxi-Min Payoff for Player 1

Figure 1 graphs player 1's payoff for each of player 2's options and indicates an intersection at the point
where:

    5p - 3 = -9p + 4
    p = 0.5

[Figure 1: player 1 strategy profiles against each of player 2's options]

Figure 2. Maxi-Min [(1-p) Option 1 + (p) Option 2] sets p = 0.5, player 1 game value = -0.5


This is the value of p that maximizes the minimum expected utility for player 1. If player 1 chooses to play
option 1 more than 50% of the time, then player 2 can profitably respond by increasing his frequency for his
own option 1 to something greater than 9/14 and correspondingly decreasing his play of option 2 to less
than 5/14. (The calculation shown above and graphed in figure 2 is done in a completely analogous way for
player 2.) In a zero-sum (or generally any constant sum) game, the payoff is sometimes represented as a
single value (implicitly player 1's payoff) for each entry in the payoff matrix instead of the vector of values
used in our example matrices (Tables 1 and 2). In this case, player 1 is sometimes referred to as the
maximizing player, and player 2 is the minimizing player. The standard approach for solving two-player
constant sum games is to look for dominating pure strategies and, if none exist, apply the Maxi-Min
algorithm for the maximizing player and the Mini-Max algorithm for the minimizing player. It is possible
for an arbitrary game to have more than one or even an infinite number of equilibria. The simple example
given here is only intended to give an intuition of the nature of equilibrium points. Since equilibrium
strategies are computed by considering the objectives of all players simultaneously, there is no problem of
out-guessing the opponent's estimate of your own behavior before you can estimate his, as is the case when
using purely decision theoretic analysis.
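The calculation of Table 3 generalizes to any 2x2 zero-sum game without a saddle point. The sketch below equates the row player's two expected-payoff lines to find p, and does the analogous calculation for the column player. It is an illustration of the maxi-min calculation for the 2x2 case only, and the function name is an assumption made here.

```python
from fractions import Fraction

# Table 2, value to player 1 (the maximizing player).
TABLE_2 = [[-3, 4], [2, -5]]

def solve_2x2_zero_sum(m):
    """Mixed equilibrium of a 2x2 zero-sum game with no saddle point.

    m[i][j] is the row player's payoff when row plays option i+1 and
    column plays option j+1.
    Returns (p, q, value): p = probability row plays option 2,
    q = probability column plays option 1, value = row player's game value.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in m]
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: the payoff lines do not cross")
    p = (a - b) / denom   # solves a(1-p) + c*p = b(1-p) + d*p
    q = (d - b) / denom   # solves a*q + b(1-q) = c*q + d(1-q)
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("formula does not apply; look for a pure-strategy saddle point")
    value = a * (1 - p) + c * p
    return p, q, value
```

For Table 2 this reproduces the values given in the text: p = 1/2, player 2 plays option 1 with frequency 9/14, and the game value to player 1 is -1/2.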

We have used what is called the normal form or strategic form game representation in our examples. When
it is necessary to analyze a particular sequence of moves because information is changing, or new options
and new constraints are added at later stages, then it is necessary to use the extensive form representation.
Extensive form is a rooted tree graph representation composed of labeled nodes and branches; each branch
represents one of the alternative actions available to the player at that node of the game. Figure 2 illustrates a
simplified poker game in extensive form. Player nodes are indicated as ovals labeled in decimal format
with the player number before the decimal and the player's information set label after the decimal. Terminal
nodes are indicated by rectangles that include the vector of payoff values for the players. In this illustration,
the root node (0.0) represents the 50-50 chance of player 1 drawing a red or black card. When player 1
chooses to Raise or Fold, he has knowledge of the card he has drawn, as indicated by the presence of
distinct node labels 1.0 (information set 0: player 1 has drawn a red card) and 1.1 (information set 1: player 1
has drawn a black card). Player 2 does not have this advantage, as indicated by the linked nodes labeled 2.0,
and must choose to See or Pass without knowing what color card was drawn by player 1. Since it is
possible to convert from extensive form to the equivalent normal form, all of the algorithms for finding
equilibria can be applied to games in extensive form as well.
Figure 2. Extensive Form Representation
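The conversion from extensive form to normal form can be sketched for the simplified poker game. The payoffs in the figure are not recoverable from the text, so the stakes below are hypothetical; only the structure (a chance move, two information sets for player 1, one for player 2) follows the description above.

```python
from itertools import product

# Hypothetical stakes, chosen for illustration only: payoff to player 1,
# with player 2 receiving the negative. A Fold ends the game before player 2 moves.
PAYOFF_P1 = {
    ("red", "Raise", "See"): 2, ("red", "Raise", "Pass"): 1, ("red", "Fold", None): 1,
    ("black", "Raise", "See"): -2, ("black", "Raise", "Pass"): 1, ("black", "Fold", None): -1,
}
CHANCE = {"red": 0.5, "black": 0.5}   # root node 0.0

def to_normal_form():
    """Expected payoff to player 1 for every pair of pure strategies.

    A pure strategy for player 1 picks one action per information set
    (1.0 = red card, 1.1 = black card); player 2 has the single
    information set 2.0 and so picks one reply for both cards.
    """
    table = {}
    for (on_red, on_black), reply in product(
            product(["Raise", "Fold"], repeat=2), ["See", "Pass"]):
        plan = {"red": on_red, "black": on_black}
        value = 0.0
        for card, prob in CHANCE.items():
            action = plan[card]
            key = (card, action, reply if action == "Raise" else None)
            value += prob * PAYOFF_P1[key]
        table[(on_red, on_black), reply] = value
    return table

normal_form = to_normal_form()
```

The result is a 4 x 2 payoff table (four pure strategies for player 1, two for player 2) to which the normal form solution methods can then be applied.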


We will not describe the much more complicated methods for finding equilibria in multiplayer games;
algorithms do exist for multiplayer zero-sum and non-zero-sum games. Research and development
continues on various approaches to efficiently compute equilibria, such as relaxation algorithms [30] based on
the Nikaido-Isoda function [31], or simplicial subdivision based on the Lyapunov function [32] or the Quantal
Response logistical function [33]. Most of these equilibrium finding algorithms for multi-player games have been
cataloged [34, 35] and there are freely available implementations for R&D. The Gambit [36] GUI and Gambit
Command Language (GCL) are excellent examples. Risk aversion of adversaries, neutrals and friendly
players can also be modeled in modern game theory. Game theory is a mature field that is going through a
renaissance in evolutionary game theory [37] and in decision theoretic and game theoretic agents [38].


[30] Relaxation Algorithms in Finding Nash Equilibria, Steffan Berridge, Jacek B. Krawczyk (Working Paper
version 2.3), Victoria University of Wellington, New Zealand (Jacek.Krawczyk@vuw.ac.nz).

[31] Note on Noncooperative Convex Games, Hukukane Nikaido, Kazuo Isoda, Pacific Journal of
Mathematics, 1955, Vol. 5, Supplement 1, 807-815.

[32] A Lyapunov Function for Nash Equilibria, Richard D. McKelvey (California Institute of
Technology), 1991.

[33] Quantal Response Equilibria for Normal Form Games, Richard D. McKelvey and Thomas R. Palfrey,
Games and Economic Behavior, 1995, Volume 10, 6-38.

[34] Computation of Equilibria in Finite Games, Richard D. McKelvey (California Institute of Technology)
and Andrew McLennan (University of Minnesota), June 30, 1996.

[35] Representations and Solutions for Game Theoretic Problems, Daphne Koller and Avi Pfeffer (Stanford
University), April 16, 1997.

[36] GAMBIT, Richard D. McKelvey, et al., California Institute of Technology, 1997. Accessible via
http://hss.caltech.edu/~gambit/Gambit.html

[37] Evolutionary Game Theory, Jorgen W. Weibull, The MIT Press, Cambridge, MA, 1995, 1996, ISBN
0-262-23181-6.

[38] Proceedings from the Workshop on Decision Theoretic and Game Theoretic Agents, Fifth European
Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, University College,
London, UK, 5 July 1999, Simon Parsons and Michael J. Wooldridge (editors). More recently, the 2nd
Workshop on Decision Theoretic and Game Theoretic Agents was held on 7 July 2000 in Boston, MA.
Proceedings from the 2nd workshop, edited by Simon Parsons and Piotr Gmytrasiewicz, may be found
online at http://www.csc.liv.ac.uk/~sp/events/gtdt/gtdt00/proc.html
Limitations of Classical Game Theory

The assumption of complete information is probably the greatest impediment to the practical application of
classical game theory. Asymmetric information games, in which players have incomplete information on
the payoffs, the options, or both, are much more typical of real-world situations. In 1966, R. J.
Aumann and M. Maschler introduced games of incomplete information. By 1968, John C. Harsanyi had
built the theoretical foundations used in modern analysis of information games in a series of three papers,
Games of Incomplete Information Played by "Bayesian" Players (Parts I, II, and III) [39]. Incomplete
information is characterized as the lack of full information about the normal form of the game in precisely
three different ways:

1. Ignorance of the physical outcome function $Y$ of the game, which specifies the physical
result of each tuple of strategies available to the N players, $y = Y(s_1, s_2, s_3, \dots, s_N)$

2. Ignorance of the utility functions $U_i$, which map a physical outcome $y$ to the utility to player i

3. Ignorance of the strategy spaces $S_i$ (set of all pure and mixed strategies $\sigma_i$ for player i)
available to each player i.

Incomplete information is very carefully distinguished from imperfect information. Imperfect information
refers to uncertainty about the history of the game: lack of perfect recall for previous moves of oneself, other
players or nature. The general approach to analysis of games of incomplete information, which we shall not
detail here, is transformational in nature. Games of incomplete information are transformed into
theoretically equivalent games with complete, but imperfect, information. The key assumption in this
approach is that every player will assign a subjective probability distribution $P_i$ to all unknown independent
variables (variables not dependent on the player's own choice of strategy). Every player will try to
maximize his own payoff $U_i$ in terms of his $P_i$. This is known as the Bayesian hypothesis.
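The Bayesian hypothesis can be illustrated with a minimal sketch: a player who does not know the value of some independent variable (here, an opponent "type") weights each possible payoff by his subjective probability and picks the action with the highest expected payoff. The types, actions, payoffs and belief below are hypothetical, and the sketch shows only this single decision step, not Harsanyi's full transformation.

```python
def bayesian_best_response(payoff_by_type, belief):
    """Pick the action with the highest expected payoff under a subjective belief.

    payoff_by_type[t][a]: the player's payoff for action a if the unknown
                          variable takes value t
    belief[t]: the player's subjective probability that the value is t
    Returns the chosen action and the expected payoff of every action.
    """
    actions = next(iter(payoff_by_type.values())).keys()
    expected = {
        a: sum(belief[t] * payoff_by_type[t][a] for t in belief)
        for a in actions
    }
    best = max(expected, key=expected.get)
    return best, expected

# Hypothetical numbers for illustration only.
PAYOFFS = {
    "hostile": {"patrol": 3, "negotiate": -4},
    "neutral": {"patrol": -1, "negotiate": 2},
}
BELIEF = {"hostile": 0.25, "neutral": 0.75}

best_action, expected_payoffs = bayesian_best_response(PAYOFFS, BELIEF)
```

Changing the belief changes the choice: a player who considered the opponent hostile with high probability would prefer to patrol under these payoffs.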

We began this introductory overview with the assumption that a game will involve indefinitely repeated
plays, or at least some unknown random number of plays. When we know the exact number of
plays or stages in a game and have a well defined goal state, dynamic programming is a candidate method
to identify the best sequences of plays. Dynamic programming regresses one stage at a time from a
specification of the goal state, expanding least cost transition paths constructed from the players' options,
until the initial stage is reached.
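The regression from the goal state can be sketched as follows. This is a single-decision-maker simplification with hypothetical states and costs; it only illustrates the stage-by-stage backward pass, not the combined game theoretic treatment.

```python
def backward_induction(stages, transitions, goal):
    """Least-cost plan found by regressing from the goal, one stage at a time.

    stages: list of lists of states; stages[0] is the initial stage
    transitions: dict (state, next_state) -> cost of the option linking them
    goal: the state required at the final stage
    Returns (cost_to_go, best_next), both keyed by (stage index, state).
    """
    last = len(stages) - 1
    cost_to_go = {(last, goal): 0}
    best_next = {}
    for k in range(last - 1, -1, -1):          # regress toward the initial stage
        for s in stages[k]:
            options = [
                (transitions[(s, n)] + cost_to_go[(k + 1, n)], n)
                for n in stages[k + 1]
                if (s, n) in transitions and (k + 1, n) in cost_to_go
            ]
            if options:
                cost_to_go[(k, s)], best_next[(k, s)] = min(options)
    return cost_to_go, best_next

# Hypothetical three-stage example.
STAGES = [["start"], ["a", "b"], ["goal"]]
COSTS = {("start", "a"): 2, ("start", "b"): 5, ("a", "goal"): 6, ("b", "goal"): 1}
cost_to_go, best_next = backward_induction(STAGES, COSTS, "goal")
```

Here the cheaper first step (to "a") is rejected because the regression from the goal shows that the path through "b" has the lower total cost.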

Another challenge is of course computational tractability, a problem to which both dynamic programming
and the classical discrete game solution methods are vulnerable. This challenge arises as the number of
players in the game increases, or as their average number of potential options increases. In the case of dynamic
programming the problem is exponential in the number of stages to be regressed.

We also assumed in our opening treatment of game theory that the payoff function, $U_i$, is constant. How
should our strategy evolve as payoff evolves? Differential game theory [40] introduces state variables and
replaces actions with control variables and a set of kinematic equations that link a player's control variable
settings to his traversal of state space. A single-player differential game essentially reduces to an optimal
control problem. Every game begins with each player in some initial state space location. The play of the
game moves the players through state space according to their control strategy. An equilibrium solution
corresponds to an optimal solution for a given objective function. For the most part differential game theory
is practical in zero-sum two-player contexts. An optimal control problem with independently steered sets of
control variables is a good analogy for differential game theory, except that multiple players give rise to
equivocal surfaces, that is, bifurcations in strategy that are equally effective.


[39] J. C. Harsanyi. 1967-68. Games of Incomplete Information Played by Bayesian Players (I, II, and III).
Management Science, 14, pages 159-182, 320-334, 486-502.

[40] Differential Games, A Mathematical Theory with Applications to Warfare and Pursuit, Control and
Optimization, Rufus Isaacs, Dover Publications, Mineola, New York, 1999.
Modern hybrids such as Hyper-Gaming [41, 42] and Decision Theoretic Planning [43] offer some new roads
connecting valuated spaces to knowledge-intensive frameworks where hierarchical decomposition, formal
description theories, effects based planning and Markov decision processes may be used to model the
necessary dynamics and construct the details of a needed plan. Hybrid game theoretic extensions leverage
hierarchical direction of search, reasoning or planning based on strategies developed within the
game-decision theoretic domain before turning over control to the lower level domain specific search or planning
mechanisms.

Linguistic Geometry (LG) [44] is yet another related approach to construction of mathematical models for
knowledge representation and reasoning about large-scale multi-agent systems. A number of such systems,
including air/space combat, robotic manufacturing, software re-engineering, Internet CyberWar, etc., can be
modeled as abstract board games. These are multi-player adversarial games whose moves can be
represented by means of moving abstract pieces over locations of an abstract board. The adversaries,
dimensions of the abstract board, mobility of pieces, and simultaneity or sequencing of moves can all be
tailored to the situation. The purpose of LG is to provide strategies to guide the participants of a game to
reach their global goals. Traditionally, finding such strategies required searches in giant search trees. LG
dramatically reduces the size of the search trees by using expert heuristics that replace search by capture of
emergent strategies from modeled agents (bombers, space interceptors, etc.) pursuing their local goals. The
formalized expert strategies yield efficient algorithms for problem settings whose dimensions may be
significantly greater than those for which the experts developed their strategies. Moreover, these formal
strategies allow application to problem domains beyond the areas originally envisioned by the experts. To
formalize the heuristics, LG employs formal linguistics as well as geometric structures over the abstract
board; thus it was named Linguistic Geometry.


Game  theoretic  R&D  needs  for  Wargaming 

While  game  theory  has  a  mature  base  of  research  and  technical  progress  upon  which  to  build,  there  are 
some  particular  areas  in  need  of  development  if  game  theory  is  to  be  usefully  applied  as  a  tool  in 
wargaming the asymmetric environment. We outline four areas. There are undoubtedly other areas as well,
but progress in these would go a long way toward the realization of game theoretic wargaming.

(1)  Synthesizing  the  game  from  the  situation  and  historical  data 

How do we automatically enumerate relevant players, their options, and estimated payoffs (automatic
pressure point analysis)? Are tools such as Antecedent, Behavior, Consequent (ABC) databases, text
summarization  or  rapid  knowledge  bases  rich  enough  for  this  task?  How  do  we  interactively  combine 
expert  judgements  about  consequences  and  payoff?  Can  we  automatically  update  games  to  similar 
situations  thus  reusing  previous  expert  assessments  on  payoffs  and  previous  solution  strategies?  What  must 
we  monitor  in  order  to  determine  when  the  situation  has  changed  sufficiently  to  render  the  current  game 
invalid,  or  in  need  of  adaptation?  Can  game  updating  keep  up  with  typical  situation  dynamics? 


Merging  AI  Game  Theory  in  Multiagent  Planning,  Russell  Vane,  Paul  Lehner,  Kathryn  Laskey,  1990 
IEEE  Transactions,  853-857. 

Using  HyperGames  to  Select  Plans  in  Competitive  Environments,  Russell  Richardson  Vane  III, 
Dissertation  submitted  in  partial  fulfillment  of  Ph.D.  in  Information  Technology,  George  Mason 
University,  Spring  2000. 

Decision  Theoretic  Planning:  Structural  Assumptions  and  Computational  Leverage,  Craig  Boutilier, 
Thomas  Dean,  Steve  Hanks,  Journal  of  Artificial  Intelligence  Research  11,  July  1999. 

Linguistic Geometry: From Search to Construction, Boris Stilman, Kluwer Academic Publishers, 2000,
LC QA76.9.C65 S76 2000 003'.3-dc21. Per personal communication with the author, a number of
prototypes of LG systems and commercial products have been developed at Lockheed Martin Corp., GIS
Solutions, Sandia National Laboratories, US Air Force Phillips Laboratory, University of Denver, and
University of Colorado at Denver. A team of 3 universities (Wayne State, Cornell, and Univ. of Colorado at
Denver) led by Rockwell Science Center is applying LG in the Joint Force Air Component Commander
(JFACC) Project Agile Symbolic Mission Control and Hostile Counteraction for DARPA.

(2) Finding and applying optimal strategies

Techniques exist for finding some equilibria of multi-player games⁴⁵, but under what conditions can we, or
must we, find all equilibria? How do static equilibria compare to dynamic equilibria arrived at by learning
agents of limited capability and intelligence (bounded rationality)? How can the efficiency of
equilibrium-finding algorithms be improved, and under what conditions? It has been suggested that an efficient
implementation of the Lemke-Howson algorithm for games with more than two players (n > 2) is possible
(GAMBIT has implemented it for two-player games). This, an efficient implementation of the Lyapunov
approach, or both would be good targets for development of more complete strategy formation. How can we use
expert knowledge of psychological factors of personality (degree of risk aversion, for example) to select the
most likely strategies of other players?
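
To make the equilibrium-selection issue concrete, the following is a minimal sketch for the smallest possible case, a two-player game with two actions each. It is not the Lemke-Howson algorithm and not GAMBIT code; it simply checks best responses for pure equilibria and solves the indifference conditions for the fully mixed one. The function names, the payoff matrices A and B, and the integer payoffs are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def pure_equilibria(A, B):
    """Return all pure-strategy Nash equilibria (row, col) of a bimatrix game.

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff
    when row plays action i and column plays action j.
    """
    rows, cols = len(A), len(A[0])
    found = []
    for i, j in product(range(rows), range(cols)):
        row_best = all(A[i][j] >= A[k][j] for k in range(rows))
        col_best = all(B[i][j] >= B[i][l] for l in range(cols))
        if row_best and col_best:
            found.append((i, j))
    return found


def mixed_equilibrium_2x2(A, B):
    """Return (p, q) for the fully mixed equilibrium of a 2x2 game, or None.

    p is the probability that row plays action 0, q the probability that
    column plays action 0. Payoffs are assumed to be integers (or Fractions).
    """
    # Row's mix p must leave the column player indifferent between columns:
    #   p*B00 + (1-p)*B10 = p*B01 + (1-p)*B11
    dp = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    # Column's mix q must leave the row player indifferent between rows:
    #   q*A00 + (1-q)*A01 = q*A10 + (1-q)*A11
    dq = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if dp == 0 or dq == 0:
        return None
    p = Fraction(B[1][1] - B[1][0], dp)
    q = Fraction(A[1][1] - A[0][1], dq)
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


# Hypothetical coordination game with conflicting preferences.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
print(pure_equilibria(A, B))
print(mixed_equilibrium_2x2(A, B))
```

Even this toy game has three equilibria (two pure, one mixed), which is the selection problem raised above in miniature; larger games require the general-purpose algorithms discussed in the text.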

(3)  Directed  modification  of  the  game 

Given  that  some  equilibrium  solution  is  seen  to  hold  in  the  real  world  situation,  what  techniques  are 
available  for  us  to  optimally  adjust  payoffs  (when  this  is  possible  by  virtue  of  transferable  utility  or 
cooperation)  to  induce  a  transfer  to  another  more  politically  desirable  equilibrium?  Can  we  equivalently 
induce such a transfer via strategy adjustment? How do we systematically value induced asymmetries in
the  information  sets  of  various  players  and  incorporate  such  options  in  our  strategy? 
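
One very simple reading of "adjusting payoffs" can be sketched as follows: for a chosen target outcome of a two-player game, compute the smallest bonus each player would need at that outcome so that no unilateral deviation is profitable. This is only an illustration under strong assumptions (pure strategies, payoffs adjustable at a single cell); the function names and the example payoffs are hypothetical.

```python
def is_pure_nash(A, B, profile):
    """True if `profile` = (row, col) is a pure Nash equilibrium of (A, B)."""
    i, j = profile
    row_ok = all(A[i][j] >= A[k][j] for k in range(len(A)))
    col_ok = all(B[i][j] >= B[i][l] for l in range(len(B[0])))
    return row_ok and col_ok


def min_incentives(A, B, target):
    """Smallest bonus per player at `target` that removes any gain from deviating."""
    i, j = target
    row_bonus = max(A[k][j] for k in range(len(A))) - A[i][j]
    col_bonus = max(B[i][l] for l in range(len(B[0]))) - B[i][j]
    return row_bonus, col_bonus


def with_incentives(A, B, target):
    """Return copies of A and B with the minimal bonuses added at `target`."""
    i, j = target
    row_bonus, col_bonus = min_incentives(A, B, target)
    A2 = [row[:] for row in A]
    B2 = [row[:] for row in B]
    A2[i][j] += row_bonus
    B2[i][j] += col_bonus
    return A2, B2


# Prisoner's-dilemma-style payoffs: action 0 = cooperate, action 1 = defect.
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(is_pure_nash(A, B, (0, 0)))          # mutual cooperation before adjustment
print(min_incentives(A, B, (0, 0)))        # bonus needed by (row, column)
A2, B2 = with_incentives(A, B, (0, 0))
print(is_pure_nash(A2, B2, (0, 0)))        # mutual cooperation after adjustment
```

Note that in this example the original equilibrium (mutual defection) still survives the adjustment. Making the desired outcome an equilibrium is therefore only a necessary step; inducing players to actually move to it is the harder question posed above.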

(4)  Visualization  of  the  Game  Space 

We need techniques for making the situation and its various solutions intuitive and palpable to the
non-technical user. Though not really a game theoretic challenge, this effort would free the user from the
current focus on the physical level and open the vistas of strategy space and valuation contours by
identifying key intuitive 2D and 3D projections of the hyper-dimensional game environment. An example
may  be  player  value  surfaces  as  functions  of  selected  or  especially  sensitive  strategy  option  variables. 
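
As a rough sketch of what such a value surface could be, the code below tabulates one player's expected payoff in a 2x2 game over a grid of mixed strategies. The variables p and q (each player's probability of choosing action 0), the grid resolution, and the payoff matrix are assumptions made for illustration; the resulting table could be handed to any plotting tool as a surface or contour map.

```python
def expected_payoff(M, p, q):
    """Expected payoff under matrix M when row plays action 0 with
    probability p and column plays action 0 with probability q."""
    return (p * q * M[0][0]
            + p * (1 - q) * M[0][1]
            + (1 - p) * q * M[1][0]
            + (1 - p) * (1 - q) * M[1][1])


def value_surface(M, steps=5):
    """Tabulate expected payoff on a steps-by-steps grid over (p, q) in [0, 1]^2.

    surface[a][b] is the value at p = grid[a], q = grid[b].
    """
    grid = [k / (steps - 1) for k in range(steps)]
    return [[expected_payoff(M, p, q) for q in grid] for p in grid]


# Hypothetical row-player payoffs.
A = [[2, 0], [0, 1]]
for row in value_surface(A, steps=5):
    print(" ".join(f"{v:5.2f}" for v in row))
```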


Grounding  Analysis  &  Models  in  Reality 

Given  a  game  theoretic  perspective,  how  do  we  connect  it  to  the  operant  reality  of  any  given  situation? 
Where  do  we  begin  in  the  process  of  formulating  players,  options,  and  payoffs?  Figure  3  illustrates  the 
concept  of  lifting  a  hypothetical  game  from  an  ABC  database  of  historical  events.  An  ABC  database,  as  its 
name  implies,  includes  selected  antecedents  to  historical  events,  behaviors  or  options  actually  executed  by 
the  collected  targets,  and  a  valuation  of  the  degree  of  success  or  value  achieved  by  the  target's  action 
(consequent) for a given set of antecedents and behaviors.


Computation  of  Equilibria  in  Finite  Games,  Richard  D.  McKelvey  (rdm@hss.caltech.edu)  and  Andrew 
McLennan  (mclennan@wallev.econ.umn.edu)  June  30,  1996. 

[Figure 3. Bootstrapping the Hypothetical Game. Diagram not reproduced; surviving label: Virtual Strategy Vectors.]


Figure 3 gives a high-level indication of the process outlined in research area (1) above. A detailed
elaboration is beyond the scope of this overview paper, but the fundamental notion is that de facto mixed
strategy vectors are implicit in the ABC history for each target or player. The process outlined here extracts
the implicit strategy vectors and incorporates available intelligence on player ideology, worldview, beliefs,
knowledge, capabilities and objectives to generate a plausible set of payoffs. The combination of implicit
strategy vectors, a plausible payoff matrix and individual player information sets constitutes the initial
hypothetical game. Refinement of the initial hypothesis could be directed by reducing uncertainty in the
payoff and information estimates and in the options available to players over time. The evolving estimate of the
game would in turn serve as the analytical basis for developing our own strategy (using the techniques of
research area (2)) and for directing modification of payoffs where possible, in accord with results from
research area (3) above.
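
The notion of strategy vectors being implicit in an ABC history can be illustrated with a small sketch. It assumes, purely for illustration, that each record is a tuple (player, antecedent, behavior, consequent value); the paper does not specify a schema, and the labels below are invented. The empirical frequency of each behavior given an antecedent serves as a de facto mixed strategy, and the mean consequent value serves as a first crude payoff estimate to be refined with other intelligence.

```python
from collections import Counter, defaultdict


def implicit_strategies(records):
    """Empirical mixed strategy per (player, antecedent).

    Returns {(player, antecedent): {behavior: relative frequency}}.
    """
    counts = defaultdict(Counter)
    for player, antecedent, behavior, _value in records:
        counts[(player, antecedent)][behavior] += 1
    strategies = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        strategies[key] = {b: n / total for b, n in counter.items()}
    return strategies


def mean_payoffs(records):
    """Mean consequent value per (player, antecedent, behavior)."""
    sums = defaultdict(float)
    counts = Counter()
    for player, antecedent, behavior, value in records:
        key = (player, antecedent, behavior)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical ABC history with invented labels and values.
history = [
    ("X", "a1", "wait", 0.0),
    ("X", "a1", "act", 1.0),
    ("X", "a1", "act", 0.5),
    ("X", "a2", "wait", 0.2),
]
print(implicit_strategies(history))
print(mean_payoffs(history))
```

Raw frequencies of this kind are only a starting hypothesis; sparse histories would call for smoothing or the learning methods cited in the text.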

Similar work in machine learning has been performed to evolve negotiation strategies in three-party
coalition games⁴⁶ and to evolve neural network models from ABC databases to predict behaviors from
antecedents⁴⁷. The advantages that hypothetical game estimation offers are semantic transparency and a
theoretical foundation for strategy development. The values assigned to the estimated game payoffs can be
inspected and provide a plausible explanation of the otherwise apparently irrational behaviors observed in
the history of terrorism. The payoff estimates are the basis for inferring the players' motivation and for
deciding how we may best respond, but it is an interesting fact that equilibria are invariant under positive
affine transformations of each player's payoff values. So we are not constrained to a particular absolute set
of payoff values to get a particular equilibrium behavior. On the other hand, when using a game theoretic
formulation for predictive purposes, we have the problem of selecting the most probable equilibrium from
the many possible equilibria. The field of reinforcement learning⁴⁸ is beginning to address the problem of
independent agents converging to a common equilibrium⁴⁹.
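
The invariance property mentioned above can be written out in one step. The notation is introduced here for illustration and does not appear in the original: $u_i$ is player $i$'s expected payoff, $\sigma_i$ and $\tau_i$ are two of that player's (possibly mixed) strategies, $\sigma_{-i}$ is the profile of the other players, and $a_i > 0$, $b_i$ are arbitrary constants that may differ from player to player.

```latex
u_i'(\sigma) = a_i\,u_i(\sigma) + b_i, \quad a_i > 0
\;\Longrightarrow\;
u_i'(\sigma_i, \sigma_{-i}) - u_i'(\tau_i, \sigma_{-i})
  = a_i \bigl[\, u_i(\sigma_i, \sigma_{-i}) - u_i(\tau_i, \sigma_{-i}) \,\bigr]
```

Because $a_i > 0$, the sign of every payoff comparison is preserved, so each player's best responses, and hence the set of Nash equilibria, are the same in the rescaled game. The constant $b_i$ cancels, which is why only relative, not absolute, payoff values matter.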


⁴⁶ On Automated Discovery of Models Using Genetic Programming in Game-Theoretic Contexts, Garett
Dworman, Steven O. Kimbrough, and James D. Laing, Proceedings of the 28th Annual Hawaii International
Conference on System Sciences, 1995.

⁴⁷ Evolutionary Models of Terrorist Threat, Final Report DI-MISC-80711, David B. Fogel, Natural
Selection, Inc. Sponsored by DARPA on ARPA Order D611/70. Issued under contract no.
DAAH01-00-C-R044, June 13, 2000.

⁴⁸ Reinforcement Learning: A Tutorial, Mance E. Harmon and Stephanie S. Harmon

Conclusion


The broad and dynamic scope of asymmetric war demands flexible and adaptive wargaming and analytical
tools and models: tools that can be readily applied to a wide variety of situations, vulnerabilities and
threats. The purpose of the Wargaming the Asymmetric Environment Program is to develop underlying
technologies and prototypes that will significantly advance US preparation for asymmetric threats and the
ability to respond to and proactively mitigate or eliminate them.

This paper described what is understood by asymmetry and asymmetric threat and outlined a game theoretic
perspective on supporting strategy development and C2 decision support in the asymmetric environment. It
is envisioned that game theoretic formulations, analyses and solutions would greatly benefit next-generation
wargames, particularly in the coupling of strategic effects to operational outcomes and in the
development of effective strategies to form or block targeted coalitions. Evaluation and modeling of
multiple game equilibria provide a theoretically sound basis for generating rational courses of action,
either for off-line analysis or for the automated C2 decision making so sorely needed in wargame
simulations.

There remain many unaddressed and unanswered issues that I did not have time to explore in this paper.
Two seem most pressing:

1. Can equilibrium calculation or estimation be kept practical in complex games? We know that the
number of possible totally mixed equilibria is in general an exponential function of the number of
players and actions available per player. How do these multiple equilibria group, disperse, and relate on the
simplex?

2. How should we profile the rationality bounds of our adversaries? Even if we are able to estimate their
utility function, how do we characterize their own cognizance of options and their ability to calculate and
follow a best response to their particular circumstance?

Developing answers to these and other issues of modeling the asymmetric environment awaits future
research.


http://www-anw.cs.umass.edu/~mharmon/rltutorial/noframes.html 
⁴⁹ Rational Learning of Mixed Equilibria in Stochastic Games, Michael Bowling and Manuela Veloso,
http://www.cs.cmu.edu/~mhb/papers/00-rational.ps.gz 